Introduction

Since ARMA (Box and Jenkins, 1994), the classical linear model for stationary
time series, a multitude of extensions for the nonlinear and non-stationary
cases have been developed. Models for nonlinear time series can be divided into
two groups: those modelling nonlinearity in the conditional mean, e.g. the
threshold autoregressive model (Tong and Lim, 1980), and those modelling
nonlinearity in the variance, e.g. conditional heteroskedastic (CH) models
including ARCH (Engle, 1982), GARCH (Bollerslev, 1986) and their extensions,
which were inspired by the particular behaviour of financial time series
("stylized facts") such as volatility clustering, heavy tails and asymmetry.
Generalizations for the non-stationary case include ARIMA and ARFIMA as well as
the IGARCH (Nelson, 1990) and FIGARCH (Baillie et al., 1996) models for
(fractionally) integrated time series. Fan and Wenyang (2008) give an overview
of modern developments in the area of varying coefficient models, which
originated with Hastie and Tibshirani (1993).

However, the question of whether ubiquitous financial time series are
nonstationary, e.g. possess a unit root, or experience structural breaks has
been extensively debated (Nelson and Plosser, 1982; Perron, 1989). It has been
shown (Mikosch and Starica, 2004) that long range memory effects in financial
time series may be caused by structural breaks rather than result from
nonstationarity. Further, it has been argued (Diebold and Inoue, 2001) that
structural breaks can easily be overlooked, with a negative impact on the
quality of modelling, estimation and forecasting (Hillebrand, 2005). Perron
(1989) introduced a model with breaks that are exogenous, i.e. induced by
external influence at known times, whereas Zivot and Andrews (2002) suggested a
sequential method of testing for endogenous breaks, i.e. breaks not known
beforehand. In this paper we also test for endogenous breaks sequentially and
apply locally parametric estimation methods to stationary subsets of the
series. Methods of this kind have been presented e.g. in Fan and Gu (2003) for
the adaptive selection of the decay factor used to weight components of the
pseudo-likelihood function, in Dahlhaus and Subba Rao (2006) for the
formulation of locally stationary ARCH($\infty$) processes, and in Cheng et al.
(2003) for locally choosing the parameters of a filter.

The locally parametric methods considered here are the local change point (LCP)
procedure (Mercurio and Spokoiny, 2004), the local model selection (LMS), also
known as the intersection of confidence intervals (ICI) (Katkovnik and
Spokoiny, 2008), and the stagewise aggregation (SA) (Belomestny and Spokoiny,
2007). Until now the properties of these procedures have been proven only for
the corresponding particular cases. In this paper we present a unified approach
to adaptive estimation on which the three aforementioned methods are based,
treat the theoretical properties of the estimators in a general way and give a
universal procedure for the choice of parameters (critical values).

Another focus of the paper is the generality of the model. Modelling financial
data involves the use of various distributions. For instance, the volatility
model is applied to study the dynamics of squared returns, transactions on the
market can be regarded as Bernoulli random events, the time between
transactions is usually assumed to follow the exponential or Weibull
distribution, and transaction intensity can be described by the Poisson
distribution. In this paper a rather general model allowing for an arbitrary
distribution from the exponential family is adopted. Models of this kind,
though with an emphasis on generalized linear models, have been used to study
binary, categorical and count time series (Fokianos and Kedem, 2003).

The paper is organized as follows. Section 1 is devoted to the formulation of
the problem in the context of univariate time series and to the theoretical
introduction. Section 2 describes the local parametric methods. Section 3 gives
the procedure for obtaining the critical values, which are essential parameters
of the procedures. Section 4 collects theoretical properties of the procedures,
including the main result, the oracle inequality, stating that the quality of
the adaptive estimates provided by the procedures is comparable with the
quality of the best possible estimate. Finally, Section 5 demonstrates the
performance of the procedures on simulated and real data following various
distributions.

1. Model and set-up

Consider a one-dimensional stochastic process $Y_t$ in discrete time
$t \in \mathbb{N}$ that is progressively measurable w.r.t. a filtration
$(\mathcal{F}_t)$. We assume that the process $Y_t$, conditionally on the past,
has a distribution $P_\upsilon$ from a given family of distributions
$\mathcal{P}$ indexed by a parameter $\upsilon$:

$$\mathcal{L}(Y_t \mid \mathcal{F}_{t-1}) \in \mathcal{P} = (P_\upsilon,\ \upsilon \in \mathcal{U}).$$

In the general case, the parameter $\upsilon$ is a predictable stochastic
process, leading to the following setup:

$$Y_t \mid \mathcal{F}_{t-1} \sim P_{\upsilon_t}. \tag{1}$$

Our goal is inference on $\upsilon_t$.

1.1. Parametric and local parametric estimation and inference 

In the parametric case the parameter $\upsilon_t$ is assumed to depend on time
through a time-varying function $f_t(\theta)$ of an argument $\theta$:

$$\upsilon_t = f_t(\theta). \tag{2}$$

Then the estimation of $\upsilon_t$ based on the observations $Y_t$ can be
carried out using the maximum likelihood method. The model (1)-(2) leads to the
log-likelihood function

$$L(\theta) = \sum_t \ell(Y_t, \upsilon_t) = \sum_t \ell(Y_t, f_t(\theta)),$$

where $\ell$ is the log-density of the distribution family $\mathcal{P}$. The
estimate of the unknown parameter is then obtained by maximizing the
log-likelihood function w.r.t. $\theta$:

$$\tilde\upsilon_t = f_t(\tilde\theta) \quad\text{with}\quad \tilde\theta = \arg\max_\theta L(\theta) = \arg\max_\theta \sum_t \ell(Y_t, f_t(\theta)).$$

However, the parametric assumption is often too restrictive in practice.
We therefore follow the local parametric approach: it is supposed that for the
time point of estimation $t^*$ there exists some interval
$I = [t^* - N_I, t^*]$ of length $N_I$ within which the parametric regression
function $f_t(\theta)$ describes the process adequately. We aim to determine
that interval.

1.2. Local constant approach for the exponential family

Below we consider a special case of the parametric regression function, namely
the local constant one:

$$f_t(\theta) = \theta, \quad t \in I,$$

where both the length $N_I$ of the interval and the value $\theta$ are to be
estimated from the data. The likelihood function then assumes the form

$$L_I(\theta) = \sum_{t \in I} \ell(Y_t, \theta).$$

Further, we limit the discussion to a particular (exponential) family of
distributions, which allows for a simple expression of the likelihood function
and of the maximum likelihood estimate. Recall that distributions belonging to
the exponential family possess densities of the form

$$p(y, \theta) = p(y)\, e^{y\,\upsilon(\theta) - B(\theta)}, \quad \theta \in \Theta,\ y \in \mathcal{Y}, \tag{3}$$

where $\upsilon(\theta)$, $B(\theta)$ and $p(y)$ are some given functions.

One distinguishes between the natural and canonical parametrisations. A
parameter $\theta$ is called natural if it satisfies the equality

$$E_\theta Y = \int y\, p(y, \theta)\, P(dy) = \theta$$

for all $\theta \in \Theta$. Hence, the local maximum likelihood estimate of the
parameter is the average of the observations:

$$\tilde\theta_I = \arg\max_{\theta \in \Theta} L_I(\theta) = \sum_{t \in I} Y_t / N_I.$$
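
The local constant estimate can be sketched in a few lines. This is only an illustration under the natural parametrisation, where the maximiser is the sample mean; the function name `local_mle`, the 0-based indexing and the convention that the window holds exactly `n` observations ending at `t_star` are assumptions made for the example, not part of the source.

```python
def local_mle(y, t_star, n):
    """Mean of the n most recent observations ending at index t_star (inclusive)."""
    if n < 1 or t_star - n + 1 < 0 or t_star >= len(y):
        raise ValueError("interval does not fit inside the sample")
    window = y[t_star - n + 1 : t_star + 1]
    return sum(window) / n


# Hypothetical count data with a level shift after the fourth observation.
y = [2, 3, 2, 3, 8, 9, 7, 8]
short = local_mle(y, t_star=7, n=4)   # uses only the post-shift regime
long = local_mle(y, t_star=7, n=8)    # mixes both regimes
```

The short window tracks the current level, while the long window is biased by the earlier regime; choosing between such windows is the task of the adaptive procedures.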

A parameter $\upsilon$ is canonical if the density $p$ of any measure
$P_\upsilon \in \mathcal{P}$ w.r.t. the dominating measure $P$ can be
represented as

$$p(y, \upsilon) = \frac{dP_\upsilon}{dP}(y) = p(y)\, e^{y\upsilon - d(\upsilon)},$$

where $d(\upsilon)$ is some convex function on $\mathcal{U}$. The canonical
parametrisation is convenient because the parameter enters the log-likelihood
linearly, which allows for mixing of distributions. This property is applied in
the method of stagewise aggregation (see Section 2.3).

For the exposition we will need a measure of difference between distributions
$P_{\theta_1}$ and $P_{\theta_2}$ from the family $\mathcal{P}$ indexed by
$\theta_1$ and $\theta_2$. One popular measure is the Kullback-Leibler
divergence defined as follows:

$$\mathcal{K}(\theta_1, \theta_2) = E_{\theta_1} \log \frac{dP_{\theta_1}}{dP_{\theta_2}}.$$

Its form for the distributions used in this paper is given in Table 1. Although
the form of the Kullback-Leibler divergence depends on the parametrisation,
it is true that

$$\mathcal{K}_\theta(\theta_1, \theta_2) = \mathcal{K}_\upsilon(\upsilon(\theta_1), \upsilon(\theta_2)),$$

where $\mathcal{K}_\theta(\cdot, \cdot)$ is the form of the divergence for the
natural parametrisation, $\mathcal{K}_\upsilon(\cdot, \cdot)$ is the form for
the canonical parametrisation, and $\upsilon(\theta)$ is the function giving the
relation between canonical and natural parameters. Due to this property, many
results of this section hold for both parametrisations.
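
As a worked check of the relation between the two parametrisations, consider the Poisson family. The closed form of the canonical divergence used below is the standard expression for a density $p(y) e^{y\upsilon - d(\upsilon)}$ and is not spelled out in the text; the choice $\upsilon = \log\theta$, $d(\upsilon) = e^\upsilon$ is the usual Poisson parametrisation.

```latex
\begin{align*}
\mathcal{K}_\upsilon(\upsilon_1, \upsilon_2)
  &= d'(\upsilon_1)(\upsilon_1 - \upsilon_2) - \{ d(\upsilon_1) - d(\upsilon_2) \} \\
  &= e^{\upsilon_1}(\upsilon_1 - \upsilon_2) - e^{\upsilon_1} + e^{\upsilon_2} \\
  &= \theta_1 \log(\theta_1 / \theta_2) - (\theta_1 - \theta_2)
   = \mathcal{K}_\theta(\theta_1, \theta_2),
  \qquad \theta_i = e^{\upsilon_i}.
\end{align*}
```

The last line agrees with the Poisson entry of Table 1.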
Model                  $\mathcal{K}(\theta, \theta')$

Gaussian regression    $(\theta - \theta')^2 / (2\sigma^2)$
Bernoulli              $\theta \log(\theta/\theta') + (1 - \theta) \log\{(1 - \theta)/(1 - \theta')\}$
Poisson model          $\theta \log(\theta/\theta') - (\theta - \theta')$
Exponential model      $\theta/\theta' - 1 - \log(\theta/\theta')$
Volatility model       $\tfrac12 (\theta/\theta' - 1) - \tfrac12 \log(\theta/\theta')$

Table 1. Kullback-Leibler divergence for some distributions from the exponential family with
natural parametrisation.
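
The entries of Table 1 translate directly into code. The following is a minimal sketch; the function names are illustrative, and the arguments are assumed to lie in the interior of the parameter set (for instance $0 < \theta < 1$ for the Bernoulli model, $\theta > 0$ for the Poisson, exponential and volatility models).

```python
import math


def kl_gaussian(t1, t2, sigma=1.0):
    """Gaussian regression with known standard deviation sigma."""
    return (t1 - t2) ** 2 / (2.0 * sigma ** 2)


def kl_bernoulli(t1, t2):
    return t1 * math.log(t1 / t2) + (1 - t1) * math.log((1 - t1) / (1 - t2))


def kl_poisson(t1, t2):
    return t1 * math.log(t1 / t2) - (t1 - t2)


def kl_exponential(t1, t2):
    return t1 / t2 - 1 - math.log(t1 / t2)


def kl_volatility(t1, t2):
    return 0.5 * (t1 / t2 - 1) - 0.5 * math.log(t1 / t2)
```

Each function returns zero when both arguments coincide, and the volatility divergence is exactly half of the exponential one.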



As the following Theorem 1 asserts, the fitted likelihood
$L(\tilde\theta, \theta)$, i.e. the log ratio of the maximum value of the
likelihood function to its value at an arbitrary point $\theta$, is closely
related to the Kullback-Leibler divergence.

Theorem 1 (Polzehl and Spokoiny (2006)). For distributions from the
exponential family it is true that

$$L_I(\tilde\theta_I, \theta) = \max_{\theta'} L_I(\theta', \theta) = L_I(\tilde\theta_I) - L_I(\theta) = N_I\, \mathcal{K}(\tilde\theta_I, \theta). \tag{4}$$
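
The identity of Theorem 1 can be verified numerically for a concrete family. The sketch below uses the Poisson model with hypothetical counts; the helper names are illustrative, and the log-likelihood is written up to the additive term $-\sum_t \log Y_t!$, which does not depend on $\theta$ and cancels in the difference.

```python
import math


def poisson_loglik(y, theta):
    # Log-likelihood up to a term that does not depend on theta.
    return sum(v * math.log(theta) - theta for v in y)


def kl_poisson(t1, t2):
    return t1 * math.log(t1 / t2) - (t1 - t2)


y = [3, 1, 4, 1, 5, 9, 2, 6]      # hypothetical counts on the interval
n = len(y)
theta_mle = sum(y) / n            # local MLE = sample mean
theta = 2.5                       # arbitrary comparison point

fitted = poisson_loglik(y, theta_mle) - poisson_loglik(y, theta)
identity_rhs = n * kl_poisson(theta_mle, theta)
```

Both quantities agree up to floating-point error, so in practice the fitted likelihood can be computed from the sample mean alone.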

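The following Theorem 2 claims that in the parametric case the estimation
loss measured by $L_I(\tilde\theta_I, \theta^*)$ is bounded with high
probability.

Theorem 2 (Polzehl and Spokoiny (2006), Theorem 2.1). Let the observations
$Y_t$ be i.i.d. from $P_{\theta^*}$, $\theta^* \in \Theta$. Then for any
$\mathfrak{z} > 0$

$$P_{\theta^*}\bigl(L_I(\tilde\theta_I, \theta^*) > \mathfrak{z}\bigr) \le 2e^{-\mathfrak{z}},$$

where $P_{\theta^*}$ is the joint distribution of the observations.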
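
For the Gaussian model with known variance the exceedance probability in Theorem 2 is available in closed form, which gives a quick sanity check of the bound. The sketch assumes that the fitted likelihood equals $Z^2/2$ with $Z$ standard normal (this follows from the Gaussian entry of Table 1 applied to the sample mean); the function names are illustrative.

```python
import math


def gaussian_exceedance(z):
    """Exact P(L > z) when L = Z**2 / 2 with Z standard normal."""
    return math.erfc(math.sqrt(z))


def theorem2_bound(z):
    return 2.0 * math.exp(-z)


grid = [0.1 * i for i in range(0, 101)]       # z from 0 to 10
gaps = [theorem2_bound(z) - gaussian_exceedance(z) for z in grid]
```

All entries of `gaps` are positive on this grid, i.e. the exponential bound holds with room to spare in the Gaussian case.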

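Based on this result, one can bound the risk (expected loss) for a power loss
function with the power $r$:

Theorem 3. Let the observations $Y_t$ be i.i.d. according to $P_{\theta^*}$.
Denote by $\mathcal{R}_{r,\theta^*,I}$ the risk of estimation based on the
observations from the interval $I$:

$$\mathcal{R}_{r,\theta^*,I} = E_{\theta^*} \bigl| L_I(\tilde\theta_I, \theta^*) \bigr|^r \tag{5}$$

with some $r > 0$. Then the estimation risk is bounded:

$$\mathcal{R}_{r,\theta^*,I} \le \mathfrak{r}_r, \quad\text{where}\quad \mathfrak{r}_r = 2r \int_{\mathfrak{z} \ge 0} \mathfrak{z}^{r-1} e^{-\mathfrak{z}}\, d\mathfrak{z} = 2r\,\Gamma(r).$$

Proof. Denote the event $\{L_I(\tilde\theta_I, \theta^*) > \mathfrak{z}\}$ by
$A$. Using the expectation representation formula, integrating by parts and
applying Theorem 2 yields

$$\mathcal{R}_{r,\theta^*,I} = -\int_{\mathfrak{z} \ge 0} \mathfrak{z}^r\, dP_{\theta^*}(A) = r \int_{\mathfrak{z} \ge 0} \mathfrak{z}^{r-1} P_{\theta^*}(A)\, d\mathfrak{z} \le 2r \int_{\mathfrak{z} \ge 0} \mathfrak{z}^{r-1} e^{-\mathfrak{z}}\, d\mathfrak{z}. \qquad □$$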
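
The constant $2r\Gamma(r)$ of Theorem 3 is easy to evaluate, and for the Gaussian model it can be compared with the exact risk. The sketch below assumes, as an illustration, that the fitted likelihood is $Z^2/2$ with $Z$ standard normal, so that its $r$-th moment equals $\Gamma(r + 1/2)/\sqrt{\pi}$; the function names are hypothetical.

```python
import math


def risk_bound(r):
    """Bound from Theorem 3: 2 * r * Gamma(r), for r > 0."""
    return 2.0 * r * math.gamma(r)


def gaussian_risk(r):
    """Exact E|Z**2 / 2|**r for Z standard normal."""
    return math.gamma(r + 0.5) / math.sqrt(math.pi)


table = [(r, gaussian_risk(r), risk_bound(r)) for r in (0.25, 0.5, 1.0, 2.0)]
```

For $r = 1/2$ the bound equals $\sqrt{\pi}$, while the exact Gaussian risk is $1/\sqrt{\pi}$; the bound is therefore conservative but of the right order.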
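It follows from Theorem 2 that if some $\mathfrak{z}_\alpha$ is such that
$2e^{-\mathfrak{z}_\alpha} \le \alpha$, then

$$\mathcal{E}_I(\mathfrak{z}_\alpha) = \{\theta : L_I(\tilde\theta_I, \theta) \le \mathfrak{z}_\alpha\}$$

is the $\alpha$-confidence set for the parameter $\theta$.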
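
A short sketch of how such a confidence set looks in the Gaussian case. It assumes a known standard deviation `sigma`, takes the smallest level satisfying $2e^{-\mathfrak{z}} \le \alpha$, and uses the Gaussian divergence from Table 1, so that the set is an interval around the sample mean; the function names are illustrative.

```python
import math


def critical_level(alpha):
    """Smallest z with 2 * exp(-z) <= alpha."""
    return math.log(2.0 / alpha)


def gaussian_confidence_set(y, alpha, sigma=1.0):
    """Interval {theta : N * (mean - theta)**2 / (2 sigma**2) <= z_alpha}."""
    n = len(y)
    mean = sum(y) / n
    half_width = sigma * math.sqrt(2.0 * critical_level(alpha) / n)
    return mean - half_width, mean + half_width


lo, hi = gaussian_confidence_set([0.2, -0.1, 0.4, 0.1], alpha=0.05)
```

Because the level comes from an exponential bound rather than from the exact normal quantile, the resulting interval is wider than the classical one.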

1.3. Nearly parametric case

In practice the parametric assumption may be overly stringent and may not hold
even within an arbitrarily small interval. However, it turns out that the
results of Theorem 3 still hold to a certain extent even if the parametric
assumption is violated. The deviation from the parametric case can be described
by the magnitude

$$\Delta_I(\theta) = \sum_{t \in I} \mathcal{K}(P_{\upsilon_t}, P_\theta),$$

which is in general random. The "nearly parametric" case occurs whenever the
following condition is fulfilled:

Condition 1 (Small modelling bias). For a given interval $I^\circ$ there exists
a parameter value $\theta^\circ \in \Theta$ such that the expectation under the
true measure of the divergence $\Delta_{I^\circ}(\theta^\circ)$ is bounded by
some $\Delta^\circ \ge 0$:

$$E\, \Delta_{I^\circ}(\theta^\circ) \le \Delta^\circ.$$

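The following theorem generalizes the result of Theorem 3 to the nearly
parametric case:

Theorem 4. Let the small modelling bias condition hold. Then for $r > 0$

$$E \log\left(1 + \frac{|L_{I^\circ}(\tilde\theta_{I^\circ}, \theta^\circ)|^r}{\mathcal{R}_{r,\theta^\circ,I^\circ}}\right) \le 1 + \Delta^\circ,$$

where $\mathcal{R}_{r,\theta^\circ,I^\circ}$ is defined analogously to (5).

Proof. It suffices to apply Lemma 2 (information-theoretical bound) with

$$\varsigma = |L_{I^\circ}(\tilde\theta_{I^\circ}, \theta^\circ)|^r / \mathcal{R}_{r,\theta^\circ,I^\circ},$$

$P = P_{I^\circ}$ (true measure on the interval $I^\circ$),
$P_0 = P_{\theta^\circ}$ (parametric measure with parameter $\theta^\circ$) and
employ that $E_0 \varsigma = 1$ by definition of
$\mathcal{R}_{r,\theta^\circ,I^\circ}$. □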

This result means that the risk of the maximum likelihood estimate in the nearly 
parametric case is comparable to that in the parametric case. The logarithm 
under the expectation is induced by the information-theoretical bound, and 
$\Delta^\circ$ can be interpreted as the payment for violating the parametric assumption.
Theorem 4 leads to the notion of the oracle estimate as the one corresponding 
to the largest interval under the small modelling bias condition. Performance of 
the oracle estimate will be the basis for measuring the performance of adaptive 
estimates in the following sections. 
Fig 1. Nested intervals.

2. Adaptive estimation methods 

Following Theorem 4, the oracle estimate has been defined as the estimate
corresponding to the largest interval under the small modelling bias condition.
Unfortunately, since the oracle depends on the unknown true measure, it is not
available and can only be mimicked. Based on a sequence of growing intervals
and associated estimates, we aim to construct an adaptive estimate matching
the oracle quality. Let $t^*$ denote the time point at which the parameter
$\upsilon$ is to be estimated, and let $\{I_k\}_{k=1}^K$ be a sequence of
growing intervals of length $N_k$ with the common right edge at $t^*$ (Fig. 1):
$I_k = [t^* - N_k, t^*]$. We associate with each interval $I_k$ from this
sequence the corresponding maximum likelihood estimate, which we shall call the
weak estimate. To simplify the notation, in the sequel we write
$\tilde\theta_k = \tilde\theta_{I_k}$,
$L_k(\theta_1, \theta_2) = L_{I_k}(\theta_1, \theta_2)$ etc. The model
selection-type approach to adaptive estimation consists in picking the estimate
corresponding to the largest interval satisfying a certain predicate $A$
together with all of the smaller intervals:

$$\hat\theta = \tilde\theta_{\hat k} \quad\text{with}\quad \hat k = \max\{k : A(k) = 1 \text{ and } A(k - 1) = 1\}, \tag{6}$$

where $A(1) = 1$ by definition. Below we present two model selection-type
procedures specifying the predicate $A$.

In contrast, the aggregation approach gradually builds up the adaptive
estimate by taking convex combinations of the weak estimates. It will be shown
that the adaptive estimates obtained by all three procedures have risk
comparable to the oracle risk.

Note that all procedures are pointwise, i.e. they provide an estimate
$\hat\theta = \hat\theta(t)$ for every time point $t$.

2.1. Local change point detection 

The method of local change point (LCP) detection introduced in Mercurio and
Spokoiny (2004) is a procedure for discovering a change point within an
interval provided there can be at most one such point. Suppose that an interval
$I'' = (\tau, t^*]$ has been found to contain no change points (Fig. 2). To
test the null hypothesis that no parameter change occurs at the point $\tau$,
we take an interval $I'$ of roughly the same length as $I''$ and use as a test
statistic the difference between the sum of the log-likelihoods corresponding
to the intervals $I'$ and $I''$, and the log-likelihood corresponding to the
interval $I = I' \cup I''$ containing no change points:

$$T_{I,\tau} = \max_{\theta', \theta''} \{L_{I''}(\theta'') + L_{I'}(\theta')\} - \max_\theta L_I(\theta) = L_{I'}(\tilde\theta_{I'}) + L_{I''}(\tilde\theta_{I''}) - L_I(\tilde\theta_I), \tag{7}$$

where $L(\cdot)$ denotes the log-likelihood function. Under the null hypothesis
the test statistic should not exceed a certain critical value $\mathfrak{z}$.
Failure to reject the null hypothesis implies that $\tau$ is not a change
point, hence the interval $I''$ can be extended by including $\tau$.

Fig 2. Intervals involved in the change point detection procedure.

To test the interval $I_k$ provided $I_{k-1}$ contains no change points, we let
$I = I_{k+1}$ and accept $I_k$ if the above null hypothesis cannot be rejected
for every point $\tau \in I_k \setminus I_{k-1}$ (Fig. 2). Therefore, the
predicate $A$ from (6) assumes the form

$$A(k) = \mathbf{1}\{T_{I_{k+1},\tau} \le \mathfrak{z}_k \text{ for each } \tau \in I_k \setminus I_{k-1}\}.$$

Since an enclosing interval is necessary for testing, the largest interval that
can be tested by LCP is $I_{K-1}$.
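
The change point statistic can be sketched for the Gaussian model with known variance. By the representation of the fitted likelihood for exponential families, the statistic equals the sum of the two fitted likelihoods of the sub-intervals evaluated at the pooled estimate; this rewriting, the function names and the list-based interface are assumptions of the example, not the authors' implementation. The list `y` plays the role of $I$, `y[:split]` of $I'$ and `y[split:]` of $I''$.

```python
def gaussian_fitted_loglik(segment, theta, sigma=1.0):
    """N * K(mean(segment), theta) for the Gaussian model with known sigma."""
    n = len(segment)
    mean = sum(segment) / n
    return n * (mean - theta) ** 2 / (2.0 * sigma ** 2)


def lcp_statistic(y, split, sigma=1.0):
    """Change point statistic for I = y, I' = y[:split], I'' = y[split:]."""
    if not 0 < split < len(y):
        raise ValueError("split must leave both sub-intervals non-empty")
    pooled = sum(y) / len(y)
    left, right = y[:split], y[split:]
    return (gaussian_fitted_loglik(left, pooled, sigma)
            + gaussian_fitted_loglik(right, pooled, sigma))


def lcp_accepts(y, candidate_splits, z, sigma=1.0):
    """Predicate: no candidate change point pushes the statistic above z."""
    return all(lcp_statistic(y, s, sigma) <= z for s in candidate_splits)


y_flat = [1.0] * 8
y_jump = [0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0]
```

On the flat series the statistic vanishes for every split, whereas the series with a jump yields a large value at the true change point and is rejected for any moderate critical value.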

2.2. Local model selection 

The local model selection procedure goes back to Lepski (1990), see also
Spokoiny and Vial (2008). Given a sequence of accepted weak estimates
$\tilde\theta_1, \ldots, \tilde\theta_{k-1}$, the procedure tests a candidate
estimate $\tilde\theta_k$. The candidate estimate is accepted if it belongs to
the confidence interval $\mathcal{E}_l$ of each of the previous weak estimates
(cf. Figure 3). Formally, the predicate $A$ for LMS has the form

$$A(k) = \mathbf{1}\{\tilde\theta_k \in \mathcal{E}_l \text{ for all } l < k\}.$$

The procedure for constructing the confidence intervals of the weak estimates
is provided by the corollary of Theorem 2, which claims that
$\mathcal{E}(\mathfrak{z}_\alpha) = \{\theta : L(\tilde\theta, \theta) \le \mathfrak{z}_\alpha\}$
is the $\alpha$-confidence set for the parameter $\theta$.
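
The selection rule can be sketched as follows, assuming that the weak estimates, interval lengths and critical values are already available and that the confidence set of the $l$-th weak estimate is $\{\theta : N_l\, \mathcal{K}(\tilde\theta_l, \theta) \le \mathfrak{z}_l\}$. The Gaussian divergence is used as a default only for illustration; the function names and 0-based indexing are hypothetical.

```python
def gaussian_kl(t1, t2, sigma=1.0):
    return (t1 - t2) ** 2 / (2.0 * sigma ** 2)


def lms_select(weak, lengths, z, kl=gaussian_kl):
    """Index (0-based) of the selected weak estimate.

    weak[k]    -- weak estimate on the k-th interval (intervals grow with k)
    lengths[k] -- number of observations N_k in that interval
    z[k]       -- critical value defining the confidence set of weak[k]
    """
    selected = 0                      # the smallest interval is always accepted
    for k in range(1, len(weak)):
        inside_all = all(lengths[l] * kl(weak[l], weak[k]) <= z[l]
                         for l in range(k))
        if not inside_all:
            break                     # stop at the first rejected candidate
        selected = k
    return selected


weak = [1.0, 1.1, 1.05, 3.0, 1.0]
lengths = [5, 10, 20, 40, 80]
chosen = lms_select(weak, lengths, z=[2.0] * 5)
```

In this example the fourth weak estimate falls outside the confidence set of the first one, so the procedure stops and returns the third interval, even though the fifth estimate would again be compatible.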
Fig 3. Principle of the local model selection, with the confidence intervals $\mathcal{E}_3(\mathfrak{z}_3)$ and $\mathcal{E}_4(\mathfrak{z}_4)$ marked; here $\hat\theta = \tilde\theta_3$.



2.3. Stagewise aggregation 

The SA procedure introduced in Belomestny and Spokoiny (2007) differs from
the two methods described above in that it does not choose the adaptive
estimate $\hat\theta$ from the set of weak estimates
$\tilde\theta_1, \ldots, \tilde\theta_K$. Instead, based on the weak estimates,
it sequentially constructs aggregated estimates
$\hat\theta_1, \ldots, \hat\theta_K$ possessing the property that any
aggregated estimate $\hat\theta_k$ has smaller variance than the corresponding
weak estimate $\tilde\theta_k$, while keeping "close" to it in terms of the
statistical difference, the latter being measured through the fitted likelihood
$L_k(\tilde\theta_k, \hat\theta_{k-1}) = L_k(\tilde\theta_k) - L_k(\hat\theta_{k-1})$.
The adaptive estimate is finally taken equal to the last aggregated estimate,
$\hat\theta = \hat\theta_K$, unless early stopping (see below) occurs.

Formally, the first aggregated estimate is equal to the first weak estimate
and every subsequent aggregated estimate is a convex combination of the
previous aggregated estimate and the current weak estimate:

$$\hat\theta_k = \begin{cases} \tilde\theta_1, & k = 1, \\ \gamma_k \tilde\theta_k + (1 - \gamma_k)\, \hat\theta_{k-1}, & k = 2, \ldots, K. \end{cases} \tag{8}$$



Here $\gamma_k$ is the mixing coefficient that reflects the statistical
difference between the previous aggregated estimate $\hat\theta_{k-1}$ and the
current weak estimate $\tilde\theta_k$, and is obtained by applying an
aggregation kernel $K_{\mathrm{ag}}$ to the fitted likelihood
$L_k(\tilde\theta_k, \hat\theta_{k-1})$ scaled by the critical value
$\mathfrak{z}_k$:

$$\gamma_k = K_{\mathrm{ag}}\left(\frac{L_k(\tilde\theta_k, \hat\theta_{k-1})}{\mathfrak{z}_k}\right).$$

The aggregation kernel acts as a link between the likelihood ratio and the
mixing coefficient. The kernel is selected to ensure that a smaller statistical
difference between $\tilde\theta_k$ and $\hat\theta_{k-1}$ leads to a mixing
coefficient close to 1 and thus to an aggregated estimate $\hat\theta_k$ close
to $\tilde\theta_k$, while a large difference should provide a mixing
coefficient close to zero and thus keep $\hat\theta_k$ close to
$\hat\theta_{k-1}$. Whenever the difference is very large, the mixing
coefficient is zero, and the procedure stops prematurely by setting
$\hat\theta = \hat\theta_{k-1}$. We call this situation early stopping.

To satisfy the mentioned requirements, the kernel must be supported on the
closed interval $[0, 1]$, decrease monotonically from 1 at the left edge to 0
at the right edge and have a plateau of size $b$ on the left. Thus, the
aggregation kernel assumes the form:

$$K_{\mathrm{ag}}(u) = \begin{cases} 1, & 0 \le u \le b, \\ 1 - \bar K_{\mathrm{ag}}(u), & b < u \le 1. \end{cases}$$

Examples of $\bar K_{\mathrm{ag}}(u)$ include $\frac{u - b}{1 - b}$ (triangular
kernel), $\left(\frac{u - b}{1 - b}\right)^2$ (Epanechnikov kernel) etc.
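
The aggregation recursion can be sketched as follows. The kernel below is one admissible choice satisfying the stated requirements (plateau of size `b`, linear decrease to zero at 1); the plateau size 0.2, the Gaussian divergence and all function names are assumptions made for the illustration, and the fitted likelihood is computed as $N_k\, \mathcal{K}(\tilde\theta_k, \hat\theta_{k-1})$.

```python
def triangular_kernel(u, b=0.2):
    """1 on [0, b], linear decrease to 0 at u = 1, 0 beyond."""
    if u <= b:
        return 1.0
    if u >= 1.0:
        return 0.0
    return 1.0 - (u - b) / (1.0 - b)


def gaussian_kl(t1, t2, sigma=1.0):
    return (t1 - t2) ** 2 / (2.0 * sigma ** 2)


def stagewise_aggregate(weak, lengths, z, kernel=triangular_kernel, kl=gaussian_kl):
    """Final aggregated estimate, with early stopping when the weight is zero."""
    aggregated = weak[0]
    for k in range(1, len(weak)):
        # Fitted likelihood between the current weak estimate and the previous aggregate.
        fitted = lengths[k] * kl(weak[k], aggregated)
        gamma = kernel(fitted / z[k])
        if gamma == 0.0:
            return aggregated          # early stopping: keep the previous aggregate
        aggregated = gamma * weak[k] + (1.0 - gamma) * aggregated
    return aggregated


partial = stagewise_aggregate([0.0, 0.3], [5, 10], z=[1.0, 1.0])
stopped = stagewise_aggregate([1.0, 5.0], [5, 10], z=[1.0, 1.0])
```

In the first call the scaled fitted likelihood is 0.45, which gives a mixing coefficient strictly between 0 and 1; in the second call the difference is so large that the procedure stops early and returns the first weak estimate.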

3. Critical values and other parameters 

All procedures described above depend on a set of parameters
$\mathfrak{z}_1, \ldots, \mathfrak{z}_K$ that we call critical values. The
critical values reflect the problem design (interval length, model, method
etc.). The principle behind their selection is to ensure that the probability
of an error of the first kind under the null hypothesis (in the parametric
case) does not exceed a prescribed level, while errors at the initial
estimation steps (for small $k$) are penalized more heavily. To formalize this
idea, we introduce the notion of the adaptive estimate $\hat\theta_k$ obtained
at the $k$-th step for model selection-type procedures (for the case of SA
see (8)):

$$\hat\theta_k = \begin{cases} \tilde\theta_k, & A(l) = 1 \text{ for all } l \le k, \\ \hat\theta_{k-1}, & \text{otherwise.} \end{cases}$$



The selection of the critical values is based on the following propagation
condition, which postulates the performance in the parametric case:

Condition 2 (Propagation). Let the observations be i.i.d. from $P_{\theta^*}$
with $\theta^* \in \Theta$ and let $\alpha$ be a constant, $0 < \alpha \le 1$.
Then the risk of the adaptive estimate $\hat\theta_k$ attained at the $k$-th
step in the parametric case must be bounded:

$$E_{\theta^*} \bigl| L_k(\tilde\theta_k, \hat\theta_k) \bigr|^r \le \alpha\, \mathcal{R}_{r,\theta^*,k}, \tag{9}$$

where $\mathcal{R}_{r,\theta^*,k}$ is the parametric risk defined analogously
to (5) with $I = I_k$.

In the parametric case the adaptive estimate at the $k$-th step should be equal
to the oracle estimate: $\hat\theta_k = \tilde\theta_k$. Otherwise a false
alarm has occurred, incurring the loss $L_k(\tilde\theta_k, \hat\theta_k)$. The
propagation condition stipulates that the risk (mean loss) associated with the
$k$-th adaptive estimate does not exceed a certain fraction of the parametric
risk. The constant $\alpha$ plays the role of the level of the procedure.
However, the propagation condition is not explicit. We use the following
sequential method for the computation of critical values using Monte Carlo
simulations. Denote by $\hat\theta_l(\mathfrak{z}_k)$ for $l > k$ the adaptive
estimate obtained after the $l$-th step of the procedure run with the critical
values $\mathfrak{z}_1, \ldots, \mathfrak{z}_{k-1}$ known and
$\mathfrak{z}_{k+1}, \ldots, \mathfrak{z}_K$ set to infinity:

$$\hat\theta_l(\mathfrak{z}_k) = \hat\theta_l(\mathfrak{z}_1, \ldots, \mathfrak{z}_k, \mathfrak{z}_{k+1} = \infty, \ldots, \mathfrak{z}_K = \infty), \quad l > k.$$

The first critical value can be selected to satisfy the conditions

$$E_{\theta^*} \bigl| L_l(\tilde\theta_l, \hat\theta_l(\mathfrak{z}_1)) \bigr|^r \le \frac{\alpha\, \mathcal{R}_{r,\theta^*,l}}{K - 1}, \quad l = 2, \ldots, K.$$

Such a value exists, since for $\mathfrak{z}_1$ taken sufficiently large the
weak and adaptive estimates coincide for any $l$ and all Monte Carlo paths from
$P_{\theta^*}$, leading to zero risk. With the first $k - 1$ critical values
fixed, the $k$-th critical value is selected using the condition

$$E_{\theta^*} \bigl| L_l(\tilde\theta_l, \hat\theta_l(\mathfrak{z}_k)) \bigr|^r \le \frac{k\, \alpha\, \mathcal{R}_{r,\theta^*,l}}{K - 1}, \quad l = k + 1, \ldots, K.$$



Obtaining critical values is computationally involved; however, for a given
parametric setup it is sufficient to compute them once under the parametric
model and reuse them later. Critical values for various models are given in
Section 5.1, where the influence of the parameters $\alpha$ (test level) and
$r$ is discussed as well.
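
The quantity that the Monte Carlo calibration has to control can be sketched as follows. This is not the authors' calibration code: it only estimates, for a given set of critical values, the ratio between the risk of the adaptive estimate and the parametric risk at every step. It assumes the Gaussian model with unit variance and true parameter zero, the LMS procedure, and a parametric risk estimated from the same simulated paths; all names are hypothetical.

```python
import random


def lms_adaptive_path(weak, lengths, z):
    """Adaptive estimate after each step k for Gaussian LMS (sigma = 1)."""
    path, current, stopped = [weak[0]], 0, False
    for k in range(1, len(weak)):
        if not stopped:
            ok = all(lengths[l] * (weak[l] - weak[k]) ** 2 / 2.0 <= z[l]
                     for l in range(k))
            if ok:
                current = k
            else:
                stopped = True
        path.append(weak[current])
    return path


def propagation_ratios(lengths, z, r=0.5, n_paths=2000, seed=0):
    """Monte Carlo ratio E|L_k(weak_k, adaptive_k)|^r / E|L_k(weak_k, theta*)|^r."""
    rng = random.Random(seed)
    n_max = lengths[-1]
    loss = [0.0] * len(lengths)
    risk = [0.0] * len(lengths)
    for _ in range(n_paths):
        y = [rng.gauss(0.0, 1.0) for _ in range(n_max)]   # theta* = 0
        # Weak estimates: means of the most recent N_k observations.
        weak = [sum(y[-n:]) / n for n in lengths]
        adaptive = lms_adaptive_path(weak, lengths, z)
        for k, n in enumerate(lengths):
            loss[k] += (n * (weak[k] - adaptive[k]) ** 2 / 2.0) ** r
            risk[k] += (n * weak[k] ** 2 / 2.0) ** r
    return [l / q for l, q in zip(loss, risk)]


lengths = [5, 7, 10, 14]
no_alarm = propagation_ratios(lengths, [float("inf")] * 4, n_paths=200)
many_alarms = propagation_ratios(lengths, [0.01] * 4, n_paths=200)
```

With infinite critical values the adaptive and weak estimates coincide and all ratios are zero; with very small critical values false alarms are frequent and the ratios become positive. A calibration routine would search, step by step, for the smallest critical values keeping these ratios below the prescribed fraction of $\alpha$.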



4. Theoretical properties of adaptive estimates 

In this section the estimates obtained using the methods described in the pre- 
vious sections are shown to possess some remarkable properties. Namely, 

(a) in the local parametric case (under the small modelling bias condition) 
the difference between the oracle estimate and the adaptive estimate on 
each step is only within a small factor of the corresponding difference 
in the global parametric case, and the factor is due to the "payment for 
adaptation" (propagation result); 

(b) the quality of the adaptive estimates does not deteriorate upon violation 
of the small modelling bias condition (stability result); 

(c) the risk between the final adaptive estimate and the oracle estimate is 
bounded (oracle result). 

Some of these results have been obtained previously for particular methods,
e.g. in Katkovnik and Spokoiny (2008) for LMS, Mercurio and Spokoiny (2004)
for LCP and Belomestny and Spokoiny (2007) for SA. Here they are
reformulated for the general case of any of the three procedures.

We introduce two regularity conditions needed for the formulation of the 
claimed properties. 
Condition 3 (Exponential growth of the intervals). For some constants
$u_0$, $u$ such that $0 < u_0 \le u < 1$ the interval lengths $N_k$ satisfy for
every $2 \le k \le K$

$$u_0 \le N_{k-1} / N_k \le u.$$

Condition 4 (Compactness of the parameter set). There exists a value
$a \ge 1$ such that for any two parameter values $\theta_1, \theta_2 \in \Theta$

where $d$ is the function from the definition of the exponential family, see
Section 1.2.

Definition (Oracle index). An oracle index is the number $k^\circ$ of the
largest interval satisfying Condition 1 with $I^\circ = I_{k^\circ}$ for some
$\theta^\circ \in \Theta$ and $\Delta^\circ \ge 0$.



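4.1. Propagation property

The following result, implied by Theorem 4 and Condition 2, states that
under the small modelling bias condition the performance of the adaptive
procedures is essentially the same as in the parametric case.

Theorem 5 (Propagation property). Assume the regularity Conditions 3
and 4. Then under Condition 1 for any $r > 0$

$$E \log\left(1 + \frac{|L_{k^\circ}(\tilde\theta_{k^\circ}, \hat\theta_{k^\circ})|^r}{\mathcal{R}_{r,\theta^\circ,k^\circ}}\right) \le \alpha + \Delta^\circ$$

and

$$E \log\left(1 + \frac{|L_{k^\circ}(\tilde\theta_{k^\circ}, \theta^\circ)|^r}{\mathcal{R}_{r,\theta^\circ,k^\circ}}\right) \le 1 + \Delta^\circ.$$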

4.2. Stability after propagation

The next result, following from the definition of the LMS procedure, claims
that the quality of the adaptive estimate does not deteriorate with the growing
interval even when the small modelling bias condition no longer holds.

Theorem 6 (Stability after propagation for LMS). For $\hat k > k^\circ$, where
$k^\circ$ is the oracle index, the following holds:

$$L_{k^\circ}(\tilde\theta_{k^\circ}, \hat\theta) = L_{k^\circ}(\tilde\theta_{k^\circ}, \tilde\theta_{\hat k}) \le \mathfrak{z}_{k^\circ}. \tag{10}$$

An analogous result for the LCP method is established in Theorem 7.
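Theorem 7 (Stability for LCP). It holds for every $k < K$

$$N_k\, \mathcal{K}(\hat\theta_k, \hat\theta_{k+1}) \le \mathfrak{z}_k. \tag{11}$$

Moreover, if Conditions 3 and 4 hold, then for every $k < k' \le K$

$$N_k\, \mathcal{K}(\hat\theta_k, \hat\theta_{k'}) \le \left(\frac{a}{1 - \sqrt{u}}\right)^2 \bar{\mathfrak{z}}_k, \tag{12}$$

where $\bar{\mathfrak{z}}_k = \max_{l \ge k} \mathfrak{z}_l$.

Proof. If $I_{k+1}$ is rejected, then $\hat\theta_{k+1} = \hat\theta_k$ and the
assertion (11) trivially follows. Whenever $I_{k+1}$ is accepted, it holds that
$\hat\theta_k = \tilde\theta_k$ and $\hat\theta_{k+1} = \tilde\theta_{k+1}$. The
acceptance of $I_k$ implies by definition of the procedure that
$T_{I_{k+1},\tau} \le \mathfrak{z}_k$ for each
$\tau \in I_k \setminus I_{k-1}$ and, in particular,
$T_{I_{k+1},\tau} \le \mathfrak{z}_k$ with $\tau = t^* - N_k$ being the left
edge of $I_k$. Due to the definition (7) of the test statistic and taking into
account the representation (4) of the fitted likelihood for exponential
families it follows that

$$L_k(\tilde\theta_k, \tilde\theta_{k+1}) = N_k\, \mathcal{K}(\tilde\theta_k, \tilde\theta_{k+1}) \le \mathfrak{z}_k.$$

That proves the assertion (11).

Decomposing the Kullback-Leibler divergence according to Lemma 3 (which
requires Condition 4) and applying the just proven assertion (11) yield

$$\mathcal{K}^{1/2}(\hat\theta_k, \hat\theta_{k'}) \le a \sum_{j=k}^{k'-1} \mathcal{K}^{1/2}(\hat\theta_j, \hat\theta_{j+1}) \le a \sum_{j=k}^{k'-1} \sqrt{\mathfrak{z}_j / N_j}.$$

Use of Condition 3 leads to the bound

$$a \sum_{j=k}^{k'-1} \sqrt{\mathfrak{z}_j / N_j} \le a \sqrt{\bar{\mathfrak{z}}_k / N_k} \sum_{j=k}^{k'-1} u^{(j-k)/2}.$$

The last expression is a partial sum of the geometric series with common ratio
$\sqrt{u} < 1$. Therefore, it can be bounded from above by the total sum:

$$a \sqrt{\bar{\mathfrak{z}}_k / N_k} \sum_{j=k}^{k'-1} u^{(j-k)/2} \le \frac{a}{1 - \sqrt{u}} \sqrt{\bar{\mathfrak{z}}_k / N_k}.$$

Consequently,

$$\{N_k\, \mathcal{K}(\hat\theta_k, \hat\theta_{k'})\}^{1/2} \le \frac{a}{1 - \sqrt{u}} \sqrt{\bar{\mathfrak{z}}_k}.$$

Squaring both sides leads to the assertion (12). □

The stability result for the stagewise aggregation method is formulated in
Theorem 8.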
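Theorem 8 (Stability for SA). It holds for every $k \le K$

$$N_k\, \mathcal{K}(\hat\theta_k, \hat\theta_{k-1}) \le \mathfrak{z}_k. \tag{13}$$

Moreover, if Condition 3 holds, then for every $k < k' \le K$

$$N_k\, \mathcal{K}(\hat\theta_k, \hat\theta_{k'}) \le a^2 c_u^2\, \bar{\mathfrak{z}}_k, \tag{14}$$

where $c_u = 1/(1 - \sqrt{u})$ and $\bar{\mathfrak{z}}_k$ has been defined in
Theorem 7.

Proof. Convexity of the Kullback-Leibler divergence $\mathcal{K}(u, v)$ w.r.t.
the first argument implies

$$\mathcal{K}(\hat\theta_k, \hat\theta_{k-1}) \le \gamma_k\, \mathcal{K}(\tilde\theta_k, \hat\theta_{k-1}) + (1 - \gamma_k)\, \mathcal{K}(\hat\theta_{k-1}, \hat\theta_{k-1}) = \gamma_k\, \mathcal{K}(\tilde\theta_k, \hat\theta_{k-1}).$$

If $\mathcal{K}(\tilde\theta_k, \hat\theta_{k-1}) \ge \mathfrak{z}_k / N_k$,
then $\gamma_k = 0$ and (13) follows. Assertion (14) is proven exactly as the
analogous assertion (12). □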



4.3. Oracle result

Finally, we show that the quality of adaptive estimates is comparable with that 
of the oracle estimate. In the following theorems we assume the small modelling 
bias condition (Condition 1) as well as regularity conditions (Conditions 3 and 4) 
to hold. 

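We begin with the oracle property of the LMS estimates.

Theorem 9. For any $r > 0$ the LMS adaptive estimate $\hat\theta$ satisfies

$$E \log\left(1 + \frac{L_{k^\circ}(\tilde\theta_{k^\circ}, \hat\theta)}{\mathfrak{z}_{k^\circ}}\right) \le \Delta^\circ + \frac{\alpha\, \mathfrak{r}_r}{\mathfrak{z}_{k^\circ}} + 1.$$

Proof. The definition of the LMS adaptive estimate
$\hat\theta = \tilde\theta_{\hat k}$ and the stability property (10) imply

$$L_{k^\circ}(\tilde\theta_{k^\circ}, \hat\theta) = L_{k^\circ}(\tilde\theta_{k^\circ}, \hat\theta_{k^\circ}) + L_{k^\circ}(\tilde\theta_{k^\circ}, \tilde\theta_{\hat k})\, \mathbf{1}\{\hat k > k^\circ\} \le L_{k^\circ}(\tilde\theta_{k^\circ}, \hat\theta_{k^\circ}) + \mathfrak{z}_{k^\circ}. \tag{15}$$

By Condition 2 and Theorem 3

$$E_{\theta^\circ} L_{k^\circ}(\tilde\theta_{k^\circ}, \hat\theta_{k^\circ}) \le \alpha\, \mathfrak{r}_r.$$

By Lemma 2

$$E \log\left(1 + \frac{L_{k^\circ}(\tilde\theta_{k^\circ}, \hat\theta)}{\mathfrak{z}_{k^\circ}}\right) \le \Delta^\circ + E_{\theta^\circ} \frac{L_{k^\circ}(\tilde\theta_{k^\circ}, \hat\theta_{k^\circ})}{\mathfrak{z}_{k^\circ}} + 1 \le \Delta^\circ + \frac{\alpha\, \mathfrak{r}_r}{\mathfrak{z}_{k^\circ}} + 1,$$

and the assertion follows. □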
Since the decomposition (15) is only possible for the LMS estimates, the
oracle result for the LCP and SA estimates is somewhat weaker. It is formulated
in the following theorem.

Theorem 10. The estimate $\hat\theta$ is close to the oracle estimate
$\tilde\theta_{k^\circ}$ in the sense that

$$E \log\left(1 + \frac{|L_{k^\circ}(\tilde\theta_{k^\circ}, \hat\theta_{k^\circ})|^r}{\mathfrak{r}_r}\right) \le \alpha + \Delta^\circ,$$

$$N_{k^\circ}\, \mathcal{K}(\hat\theta_{k^\circ}, \hat\theta) \le a^2 c_u^2\, \bar{\mathfrak{z}}_{k^\circ},$$

where $\mathfrak{r}_r$ is the parametric risk bound from Theorem 3.

We prove the corollary for the case of $r = 1/2$.

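Theorem 11. Assume regularity Conditions 3 and 4. Then under Condition 1
the following holds:

$$E \log\left(1 + \frac{N_{k^\circ}^{1/2}\, \mathcal{K}^{1/2}(\hat\theta, \theta^\circ)}{a\, \mathfrak{r}_{1/2}}\right) \le \log\left(1 + \frac{a\, c_u\, \bar{\mathfrak{z}}_{k^\circ}^{1/2}}{\mathfrak{r}_{1/2}}\right) + \Delta^\circ + \alpha + 1, \tag{16}$$

where $c_u$ and $\bar{\mathfrak{z}}_{k^\circ}$ have been defined in Theorem 8.

Proof. By Lemma 3 and the stability property (12)

$$N_{k^\circ}^{1/2}\, \mathcal{K}^{1/2}(\hat\theta, \theta^\circ) \le a \left\{ N_{k^\circ}^{1/2}\, \mathcal{K}^{1/2}(\hat\theta_{k^\circ}, \hat\theta) + N_{k^\circ}^{1/2}\, \mathcal{K}^{1/2}(\tilde\theta_{k^\circ}, \hat\theta_{k^\circ}) + N_{k^\circ}^{1/2}\, \mathcal{K}^{1/2}(\tilde\theta_{k^\circ}, \theta^\circ) \right\} \le a \left\{ a\, c_u\, \bar{\mathfrak{z}}_{k^\circ}^{1/2} + L_{k^\circ}^{1/2}(\tilde\theta_{k^\circ}, \hat\theta_{k^\circ}) + L_{k^\circ}^{1/2}(\tilde\theta_{k^\circ}, \theta^\circ) \right\}. \tag{17}$$

Further, it is true that for all $a, b \ge 0$

$$\log(1 + a + b) \le \log(1 + a) + \log(1 + b). \tag{18}$$

Substituting (17) into (16) and applying the inequality (18) yields

$$E \log\left(1 + \frac{N_{k^\circ}^{1/2}\, \mathcal{K}^{1/2}(\hat\theta, \theta^\circ)}{a\, \mathfrak{r}_{1/2}}\right) \le \log\left(1 + \frac{a\, c_u\, \bar{\mathfrak{z}}_{k^\circ}^{1/2}}{\mathfrak{r}_{1/2}}\right) + E \log(1 + \varsigma)$$

with
$\varsigma = \mathfrak{r}_{1/2}^{-1} \left\{ L_{k^\circ}^{1/2}(\tilde\theta_{k^\circ}, \hat\theta_{k^\circ}) + L_{k^\circ}^{1/2}(\tilde\theta_{k^\circ}, \theta^\circ) \right\}$.
Applying Lemma 2 and estimating $E_{\theta^\circ} \varsigma$ in view of the
propagation condition (9) and Theorem 3 completes the proof. □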
Fig 4. Critical values for the standard setup (see text). The true parameter value, if relevant, is
indicated in parentheses after the model designation.



5. Simulations and applications 

5.1. Influence of the parameters on the critical values

From the propagation condition (Condition 2) it follows that the critical values
depend on the global parameters $\alpha$ (level of the procedure) and $r$ as
well as on the underlying family of distributions and on the sequence of
intervals generating the weak estimates. In this section the influence of these
parameters on the critical values is investigated.

Figure 4 shows the critical values for the standard setup: $\alpha = 0.7$,
$r = 0.5$, interval lengths from 5 to 284 points growing exponentially with the
factor 1.4. The other figures in this section show the critical values for
setups other than the standard one. In every case only the indicated parameter
is changed, the others remaining at their standard settings. For the Gaussian
and volatility models the critical values do not depend on the unknown true
parameter $\theta^*$, as stated by the following lemma.

Lemma 1 (Pivotality of the critical values). Let the observations $Y_t$ be
i.i.d. from $P_{\theta^*}$ with $\theta^* \in \Theta$, where $P$ describes
either the Gaussian or the volatility model. Then the critical values obtained
using the propagation condition (Condition 2) do not depend on $\theta^*$.

Proof. It suffices to show the pivotality of the fitted likelihood
$L_k(\tilde\theta_k, \hat\theta_k) = N_k\, \mathcal{K}(\tilde\theta_k, \hat\theta_k)$
entering Condition 2. In the rest of the proof we drop the index $k$.
Observations originating from the Gaussian model are of the form

$$Y_t = \theta^* + \sigma \varepsilon_t, \quad \varepsilon_t \sim N(0, 1),$$

where $\sigma$ is a known constant. Hence, the weak estimates can be
represented as a sum of $\theta^*$ and a further term independent of
$\theta^*$:

$$\tilde\theta = \frac{1}{N} \sum_{t=1}^N Y_t = \theta^* + \frac{\sigma}{N} \sum_{t=1}^N \varepsilon_t.$$

This is also true for the adaptive estimate $\hat\theta$ by construction of the
procedures LCP and LMS and can be shown to hold for SA by induction. Since the
Kullback-Leibler divergence $\mathcal{K}(\theta_1, \theta_2)$ for the Gaussian
model depends only on the difference of the arguments (cf. Table 1), the true
parameter value $\theta^*$ cancels.

Observations following the volatility model assume the form

$$Y_t = \theta^* \varepsilon_t^2, \quad \varepsilon_t \sim N(0, 1),$$

and the weak estimates can be represented as a product of the true parameter
value and a further term:

$$\tilde\theta = \frac{1}{N} \sum_{t=1}^N Y_t = \theta^* \cdot \frac{1}{N} \sum_{t=1}^N \varepsilon_t^2,$$

with the same holding for the adaptive estimates. As the Kullback-Leibler
divergence $\mathcal{K}(\theta_1, \theta_2)$ for the volatility model is a
function of the ratio of the arguments (cf. Table 1), $\theta^*$ cancels. This
concludes the proof. □

Fig 5. Critical values corresponding to various test levels (model: Volatility, method: LCP, standard intervals).

Fig 6. Critical values corresponding to various r levels (model: Volatility, method: LCP, standard intervals).

Fig 7. Critical values for various $\theta^*$ (model: Bernoulli, method: LMS, standard parameters and intervals).

Fig 8. Critical values for various $\theta^*$ (model: Poisson, method: LMS, standard parameters and intervals).

Figures 7 and 8 present several sets of critical values corresponding to various
values of $\theta^*$. The dependence of the critical values on $\theta^*$ is not
very strong, and in practice it is sufficient to compute one set of critical
values for a certain $\theta^*$, e.g. $\theta^* = 0.5$ in the case of the
Bernoulli model.
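
The pivotality stated in Lemma 1 can be illustrated numerically. The sketch below fixes one hypothetical noise draw, builds Gaussian and volatility observations for different true parameters, and compares the fitted likelihood between a long-window and a short-window estimate; the window sizes, the noise values and the function names are assumptions made only for this check.

```python
import math


def kl_gaussian(t1, t2, sigma=1.0):
    return (t1 - t2) ** 2 / (2.0 * sigma ** 2)


def kl_volatility(t1, t2):
    return 0.5 * (t1 / t2 - 1.0) - 0.5 * math.log(t1 / t2)


eps = [0.3, -1.2, 0.8, 2.1, -0.4, 0.9]     # one fixed, hypothetical noise draw


def gaussian_loss(theta_star, n_short=3):
    y = [theta_star + e for e in eps]
    short = sum(y[-n_short:]) / n_short
    full = sum(y) / len(y)
    return len(y) * kl_gaussian(full, short)


def volatility_loss(theta_star, n_short=3):
    y = [theta_star * e ** 2 for e in eps]
    short = sum(y[-n_short:]) / n_short
    full = sum(y) / len(y)
    return len(y) * kl_volatility(full, short)
```

Shifting the true parameter in the Gaussian model, or rescaling it in the volatility model, leaves the loss unchanged up to rounding, which is why a single set of critical values suffices for these two families.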



5.2. Delay of detection of a jump 

A jump in the parameter of the underlying distribution cannot be detected 
immediately, but only after a delay at least as large as the smallest interval. 
This section investigates this delay. Consider a sequence of true parameter 
values $\theta^*$ exhibiting a jump of height $h$ from a value $\underline{\theta}$ 
to a value $\overline{\theta} = \underline{\theta} + h$ at a given time (Figure 9). 
We regard the jump as detected once the parameter estimate has reached the 
$\delta$-fraction of the difference between $\overline{\theta}$ and 
$\underline{\theta}$ for some $0 < \delta < 1$: 

\[ \hat{t} = \min\{t : \hat{\theta}_t > \underline{\theta} + \delta(\overline{\theta} - \underline{\theta})\}. \]

We present the mean delay, i.e. the difference between $\hat{t}$ and the jump time 
averaged over 1000 replications (Figure 10), as a function of $\overline{\theta}$ 
for various distributions and $\delta = 0.7$. As one would expect, smaller jumps 
require more time to be detected. 
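
The detection rule itself is simple to state in code. The sketch below only illustrates the rule on a noise-free path; the fixed-window mean is a stand-in estimator chosen for brevity (an assumption, not LCP, LMS or SSA), so it does not reproduce Figure 10. The names `detection_delay` and `fixed_window_mean`, the jump time 100 and the window length 20 are hypothetical.

```python
import numpy as np

def detection_delay(estimates, jump_time, low, high, delta=0.7):
    """Number of steps after the jump until the estimate first exceeds
    low + delta * (high - low). Returns None if that never happens."""
    threshold = low + delta * (high - low)
    est = np.asarray(estimates, dtype=float)
    hits = np.nonzero(est[jump_time:] > threshold)[0]
    return int(hits[0]) if hits.size else None

def fixed_window_mean(y, window):
    """Stand-in estimator (an assumption of this sketch): mean of the
    last `window` observations, using fewer at the start of the sample."""
    y = np.asarray(y, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(y)))
    end = np.arange(1, len(y) + 1)
    start = np.maximum(end - window, 0)
    return (csum[end] - csum[start]) / (end - start)

# Noise-free illustration: the parameter jumps from 0 to 1 at time 100.
jump_time, low, high = 100, 0.0, 1.0
theta = np.where(np.arange(300) < jump_time, low, high)
estimates = fixed_window_mean(theta, window=20)
delay = detection_delay(estimates, jump_time, low, high, delta=0.7)
```

With a window of 20 observations the estimate exceeds the 0.7 threshold once 15 post-jump observations are inside the window, so `delay` equals 14 steps after the first post-jump observation.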



5.3. Switching regime model 

In the present section adaptive methods are applied to data following a switch- 
ing regime model. The latter is described by a sequence of Markov moments 
Fig 9. Design of the jump experiment. 

Fig 10. Mean delay of detecting a jump for various distributions (the number in 
parentheses is the pre-jump value $\underline{\theta}$), $\delta = 0.7$. Horizontal 
axis: parameter. 



$\{\nu_i\} = \{40, 99, 85, 99, 51, 38\}$ w.r.t. the filtration $(\mathcal{F}_t)$, and 
a sequence of states $\{\theta_{u_i}\}$ with $\{u_i\} = \{2, 4, 3, 5, 2, 1\}$. The 
states for the various models are listed in Table 2. Figure 11 illustrates 
estimation by different methods based on 10000 realizations of data originating 
from four distributions. The methods perform reasonably well: the adaptive 
estimate lies within the oracle confidence bounds most of the time. Note that 
the oracle confidence bounds reflect the jumps of the true value with some 
delay, which skews the "wavefronts". This is because, by construction of the 
estimation procedures, the smallest interval is always accepted, which delays 
the oracle. 
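
The design of this experiment can be sketched as follows for the Gaussian model. The sketch reads the values $\nu_i$ as the lengths of consecutive regimes, which is an assumption (their sum, 412, matches the time axis of Figure 11), and it assumes unit noise variance. The state values are the Gaussian row of Table 2; the function name `regime_path` is hypothetical.

```python
import numpy as np

# Values quoted in the text: {nu_i} and the state indices {u_i}.
nu = [40, 99, 85, 99, 51, 38]
u = [2, 4, 3, 5, 2, 1]
# Gaussian row of Table 2 (states 1 to 5).
gaussian_states = [0.00, 1.00, 1.41, 1.73, 2.00]

def regime_path(durations, state_index, states):
    """Piecewise-constant parameter path: state u_i is held for nu_i steps.
    Reading nu_i as regime lengths is an assumption of this sketch."""
    values = [states[k - 1] for k in state_index]  # states are numbered from 1
    return np.repeat(values, durations)

theta = regime_path(nu, u, gaussian_states)

# One realization of Y_t = theta_t + eps_t (unit noise variance assumed).
rng = np.random.default_rng(1)
y = theta + rng.standard_normal(theta.size)
```

Repeating the last two lines with fresh noise gives the independent realizations to which an estimation method would be applied.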

The estimation quality is characterized numerically in terms of the mean squared 
error, Kullback-Leibler divergence and mean absolute error in Table 3. 

The performance of LCP and LMS is very close, while SSA performs somewhat 
worse on data with abrupt jumps. 

5.4. Application to volatility estimation 

In this section the performance of the procedures is demonstrated by applying 
them to the problem of estimating the volatility of the daily stock price of 
Allianz in the period from 1974-01-02 to 1996-12-30. 
Fig 11. Estimation for selected combinations of method and model. Thin solid line: true 
value; shaded area: 95% confidence region for the oracle estimate; thick solid line: median 
estimate; dashed lines: 1st and 3rd quartiles. 



Model           1       2       3       4        5
Gaussian     0.00    1.00    1.41    1.73     2.00
Poisson      1.00    2.36    3.15    3.85     4.51
Bernoulli    0.01    0.42    0.65    0.79     0.87
Volatility   1.00    6.31   19.06   53.59   147.41

Table 2 

States of the switching regime model for various distributions. 



imsart-ejs ver. 2008/01/24 file: ejs_2008_336.tex date: December 2, 2008 



M. Elagin/ Locally adaptive estimation methods 








Model        Method      MSqE     KLD     MAE
Volatility   LCP       567.66    0.07    9.08
             LMS       674.27    0.08   12.02
             SSA      1101.21    0.14   16.48
Gaussian     LCP         0.08    0.04    0.17
             LMS         0.07    0.03    0.18
             SSA         0.09    0.04    0.20
Poisson      LCP         0.23    0.04    0.29
             LMS         0.20    0.04    0.31
             SSA         0.27    0.05    0.35
Bernoulli    LCP         0.02    0.11    0.08
             LMS         0.01    0.05    0.08
             SSA         0.04    0.15    0.14

Table 3 

Estimation quality in terms of the mean squared error (MSqE), Kullback-Leibler 
divergence (KLD) and mean absolute error (MAE). Based on 10000 replications. 



Let $S_t$ be an observed stock price process, and 
$Y_t = \left(\log(S_t / S_{t-1})\right)^2$ the squared log returns (Figure 12). The 
latter are described by the conditional heteroskedastic model 

\[ Y_t = \theta_t \varepsilon_t^2, \qquad \varepsilon_t \sim N(0,1), \]

which is a particular case of the general model of Section 1. The problem is to 
forecast $\theta_{t+h}$ from $Y_1, \ldots, Y_t$ for the time horizon $h$. The 
estimation is carried out using the procedures introduced above. The criterion 
used to describe and compare the performance of the procedures is the median 
$h$-step-ahead forecasting error defined as 

\[ \mathrm{FE}_h = \operatorname{median}\left[\mathcal{K}(\bar{Y}_{t,h}, \hat{\theta}_t)\right], \]

where $\bar{Y}_{t,h} = \frac{1}{h} \sum_{1 \le k \le h} Y_{t+k}$. The horizon $h$ is 
taken equal to 1, 3 and 10 steps. The results are shown in Figure 13. All 
procedures demonstrate comparable and reasonable performance. The stagewise 
aggregation procedure performs somewhat better than the other methods, in 
contrast to the switching regime model of Section 5.3. 
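
A forecasting error of this kind could be computed along the following lines. This is a sketch under stated assumptions, not the code behind Figure 13: it takes the divergence of the volatility model in its standard closed form, compares the estimate at time $t$ with the mean of the next $h$ observations, and takes the median over all $t$ for which $h$ future observations exist. The function names are hypothetical.

```python
import numpy as np

def kl_volatility(t1, t2):
    # KL divergence for the model Y = theta * eps^2 (standard closed form).
    # Both arguments must be strictly positive.
    r = np.asarray(t1, dtype=float) / np.asarray(t2, dtype=float)
    return 0.5 * (r - 1.0 - np.log(r))

def median_forecast_error(y, theta_hat, h):
    """Median over t of K(mean(Y_{t+1}, ..., Y_{t+h}), theta_hat_t).
    The averaging window and the argument order are assumptions."""
    y = np.asarray(y, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(y)))
    t = np.arange(len(y) - h)  # times with h future observations available
    future_mean = (csum[t + 1 + h] - csum[t + 1]) / h
    return float(np.median(kl_volatility(future_mean, theta_hat[t])))

# Toy usage: simulated volatility data with a constant parameter and an
# expanding-mean estimate (a stand-in for the adaptive procedures).
rng = np.random.default_rng(2)
y = 2.0 * rng.standard_normal(500) ** 2
theta_hat = np.cumsum(y) / np.arange(1, len(y) + 1)
errors = {h: median_forecast_error(y, theta_hat, h) for h in (1, 3, 10)}
```

Squared log returns that are exactly zero would make the divergence infinite for $h = 1$, so real data may need such observations to be handled separately.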

5.5. Application to waiting time estimation 

With the advent of fast and cheap computers equipped with large storage, high 
frequency financial data became available, opening new perspectives for financial 
time series analysis such as the study of bid-ask spread, transaction intensity, 
waiting times etc. In this section we apply adaptive methods to the estimation 
Fig 12. Price and log returns of the Allianz stock. 

Fig 13. Forecasting error of the Allianz stock price volatility estimation. 

Fig 14. Upper panel: waiting times between transactions (BHP Billiton stock on the 
Australian Stock Exchange on July 5, 2002); lower panels: corresponding adaptive estimates. 

of waiting times, i.e. time intervals between transactions. Waiting times are 
assumed to follow the exponential distribution with the density 




As an example, we use the data on the BHP Billiton stock traded on the 
Australian Stock Exchange (ASX) from July 1, 2002 to August 30, 2002. Figure 14 
displays waiting times within one day as well as the estimates of $\theta$. Time 
intervals are measured in ticks with a tick size of 0.00390625 seconds (1/256 s). 
As in Section 5.4, the median forecasting error is used as the performance 
criterion, with the horizon $h = 10$. The forecasting errors of the three methods 
are presented in Figure 15. The LMS method appears to perform best in this case. 



Conclusion 

In this paper a unified approach to the adaptive estimation of univariate time 
series parameters has been presented and specified for two model-selection type 
methods and one aggregation method. The approach has been applied to Gaussian, 
volatility, Poisson, exponential and Bernoulli models. A universal procedure 
for the choice of critical values has been developed. The influence of global 
parameters such as the test level on the critical values has been investigated. 
Theoretical properties of the procedures have been investigated in the most 
unified way possible so far. It is hardly possible to single out the best method 
among those considered. With a good choice of critical values, all methods 
considered demonstrate reasonable performance on simulated and real-life data. 
Fig 15. Median forecasting error of the estimation of times between transactions involving 
the BHP Billiton stock on ASX. 



Appendix 

Lemma 2 (Information-theoretical bound). Let $P$ and $P_0$ be two measures 
such that the Kullback-Leibler divergence $E \log(dP/dP_0)$ satisfies 

\[ E \log(dP/dP_0) \le \Delta < \infty. \]

Then for any random variable $\zeta$ with $E_0 \zeta < \infty$ 

\[ E \log(1 + \zeta) \le \Delta + E_0 \zeta. \]

Proof. One can check that for any fixed $y$ the maximum of the function 
$f(x) = xy - x \log x + x$ is attained at $x = e^y$. Substituting $x = e^y$ one 
obtains the inequality $xy \le x \log x - x + e^y$. Using this inequality with 
$x = Z$ and $y = \log(1 + \zeta)$, and the representation 
$E \log(1 + \zeta) = E_0\{Z \log(1 + \zeta)\}$ with $Z = dP/dP_0$, one obtains 

\begin{align*}
E \log(1 + \zeta) &= E_0\{Z \log(1 + \zeta)\} \\
&\le E_0(Z \log Z - Z) + E_0(1 + \zeta) \\
&= E_0(Z \log Z) + E_0 \zeta - E_0 Z + 1.
\end{align*}

It remains to note that $E_0 Z = 1$ and 
$E_0(Z \log Z) = E \log Z = E \log(dP/dP_0) \le \Delta$. □ 
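
The bound of Lemma 2 can be illustrated numerically for distributions on a finite set. The sketch below is only a sanity check under stated assumptions: $P$ and $P_0$ are random probability vectors with strictly positive entries, $\Delta$ is taken equal to the Kullback-Leibler divergence itself, and $\zeta$ is nonnegative. The function name `check_bound` is hypothetical.

```python
import numpy as np

def check_bound(p, p0, zeta):
    """Return (lhs, rhs) of E log(1 + zeta) <= KL(P, P0) + E0 zeta for
    probability vectors p, p0 with positive entries and zeta > -1."""
    kl = np.sum(p * np.log(p / p0))   # E log(dP/dP0)
    lhs = np.sum(p * np.log1p(zeta))  # E log(1 + zeta)
    rhs = kl + np.sum(p0 * zeta)      # Delta + E0 zeta, with Delta = KL
    return lhs, rhs

rng = np.random.default_rng(0)
results = []
for _ in range(1000):
    p = rng.dirichlet(np.ones(5))
    p0 = rng.dirichlet(np.ones(5))
    zeta = rng.exponential(2.0, size=5)  # nonnegative random variable
    results.append(check_bound(p, p0, zeta))

# The bound is attained when 1 + zeta equals the likelihood ratio Z = dP/dP0.
p = np.array([0.2, 0.3, 0.5])
p0 = np.array([0.5, 0.25, 0.25])
lhs_eq, rhs_eq = check_bound(p, p0, p / p0 - 1.0)
```

In every random draw the left-hand side stays below the right-hand side, and the choice $1 + \zeta = Z$ gives equality, in line with the maximization step in the proof.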

Lemma 3. (Polzehl and Spokoiny, 2006, Lemma 5.2) For distributions from 
the exponential family with the functions $B(\theta)$ and $C(\theta)$ (see (3)) 
continuously differentiable on $\Theta$, where $\Theta$ satisfies Condition Jf., it 
holds for every $\theta_0, \theta_1, \theta_2 \in \Theta$ 

\[ \sqrt{\mathcal{K}(\theta_1, \theta_2)} \le a \left( \sqrt{\mathcal{K}(\theta_1, \theta_0)} + \sqrt{\mathcal{K}(\theta_2, \theta_0)} \right). \]

Moreover, for any sequence $\theta_0, \theta_1, \ldots, \theta_m$ 

\[ \sqrt{\mathcal{K}(\theta_0, \theta_m)} \le a \left( \sqrt{\mathcal{K}(\theta_0, \theta_1)} + \cdots + \sqrt{\mathcal{K}(\theta_{m-1}, \theta_m)} \right). \]